Biostatistics For Dummies (Monika Wahi John Pezzullo)

covered in Chapter 24). So the test statistic from this test should follow the chi-square distribution.

Now it is obvious why it is named the chi-square test! The next step is to obtain the p value for the test

statistic. To do that manually, you would look up the test statistic (which is 8.81 in our case) in a chi-

square table.

In actuality, the chi-square distribution refers to a family of distributions. Which chi-square

distribution you are using depends upon a number called the degrees of freedom, abbreviated d.f.

or df or by the Greek lowercase letter nu (ν) (in this book we use df). The df is a measure of the

probability of independence between the value of the predictor (row) variable and value of the

column (outcome) variable.

How would you calculate the df for a chi-square test? The answer is it depends on the number of rows

and columns in the cross-tab. For the 2 × 2 cross-tab (fourfold table) in this example, you added up the four values

in Figure 12-5, so you may think that you should look up the 8.81 chi-square value with 4 df. But you’d

be wrong. Note the word independence in the preceding paragraph. And keep in mind that

the differences (observed count minus expected count) in any row or column always add up to zero. The four terms making up the

8.81 total aren’t independent of each other. It turns out that the chi-square test statistic for a fourfold

table has only 1 df, not 4. In general, an N-by-M table, with N rows, M columns, and therefore N × M

cells, has only (N-1) × (M-1) df because of the constraints on the row and column sums. In our case,

N — which is the number of rows — is 2, so N-1 is 1. Also, M — which is the number of columns —

is 2, so M-1 is 1 also (and 1 times 1 is 1). Don’t feel bad if this wrinkle caught you by surprise —

even Karl Pearson who invented the chi-square test got that part wrong!
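
If you like to see a rule in action, here is a minimal Python sketch of the df calculation and of the "differences add up to zero" constraint. The function names (chi_square_df, expected_counts) are made up for this illustration, and the counts in the table are hypothetical, not the values from Figure 12-5. The sketch assumes the usual expected count for a cell: row total times column total, divided by the grand total.

```python
def chi_square_df(n_rows, n_cols):
    """Degrees of freedom for a chi-square test on an n_rows-by-n_cols cross-tab."""
    return (n_rows - 1) * (n_cols - 1)


def expected_counts(observed):
    """Expected cell counts under independence:
    (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# HYPOTHETICAL counts for illustration only (not the data in Figure 12-5).
observed = [[30, 10],
            [20, 40]]

expected = expected_counts(observed)

# Observed-minus-expected difference for every cell.
diffs = [[o - e for o, e in zip(obs_row, exp_row)]
         for obs_row, exp_row in zip(observed, expected)]

row_sums = [sum(row) for row in diffs]          # each row of differences sums to zero
col_sums = [sum(col) for col in zip(*diffs)]    # so does each column

print("df for a 2-by-2 table:", chi_square_df(2, 2))
print("row sums of differences:", row_sums)
print("column sums of differences:", col_sums)
```

Because every row and every column of differences must sum to zero, once you know one difference in a fourfold table, the other three are locked in. That is the sense in which only 1 of the 4 terms is free to vary.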

So, if you were to manually look up the chi-square test statistic of 8.81 in a chi-square table, you

would have to look under the distribution for 1 df to find out the p value. Alternatively, if you got this

far and you wanted to use the statistical software R to look up the p value, you would use the following

code: pchisq(8.81, 1, lower.tail = FALSE). Either way, the p value for chi-square = 8.81, with 1 df, is

0.003. This means that there’s only a 0.003 probability that random fluctuations could produce the

effect seen, where CBD performs so differently than NSAIDs with respect to pain relief in chronic

arthritis patients. A 0.003 probability is the same as 1 chance in 333 (because 1/0.003 ≈ 333), meaning

very unlikely, but not impossible. So, if you set α = 0.05, because 0.003 < 0.05, your conclusion would

be that in the chronic arthritis patients in our sample, whether the participant took CBD or NSAIDs

was statistically significantly associated with whether or not they felt pain relief.
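
If you don't have R or a chi-square table handy, the lookup above can be sketched in plain Python. This is an illustration under a stated assumption, not a general-purpose routine: it works only for 1 df, where the upper-tail chi-square probability equals erfc(sqrt(x / 2)) because a 1-df chi-square variable is the square of a standard normal variable. The function name p_value_chi_square_1df is hypothetical. For other df values you would need a statistics package.

```python
import math


def p_value_chi_square_1df(chi_square):
    """Upper-tail p value for a chi-square statistic with exactly 1 df.

    Uses the identity P(X > x) = erfc(sqrt(x / 2)), which holds only
    when df = 1 (a 1-df chi-square is a squared standard normal).
    """
    if chi_square < 0:
        raise ValueError("A chi-square statistic cannot be negative.")
    return math.erfc(math.sqrt(chi_square / 2))


chi_square = 8.81   # test statistic from the example in the text
alpha = 0.05        # significance level used in the text

p = p_value_chi_square_1df(chi_square)

print("p value (rounded to 3 decimals):", round(p, 3))
print("significant at alpha = 0.05?", p < alpha)
```

For this example, the result should agree with the R call pchisq(8.81, 1, lower.tail = FALSE) from the text once both are rounded to three decimal places.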

Putting it all together with some notation and formulas

The calculations of the Pearson chi-square test can be summarized concisely using the cell-

naming conventions shown in Figure 12-6, along with the standard summation notation described

in Chapter 2.